The cells annotated as UNS by Cano-Gamez et al. were T cells that were neither stimulated by cytokines nor had an activated TCR. To facilitate the analysis, these cells were analyzed separately.
Loading the necessary libraries
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# settings can be adapted individually
sc.settings.verbosity = 3
sc.logging.print_header()
sc.settings.set_figure_params(dpi = 100, format = 'png')
scanpy==1.7.2 anndata==0.7.6 umap==0.4.6 numpy==1.20.1 scipy==1.6.2 pandas==1.2.4 scikit-learn==0.24.2 statsmodels==0.12.2 python-igraph==0.9.1 louvain==0.7.0
Load preprocessed scRNA-seq data
See notebook "Data preprocessing" for this analysis part
canogamez = sc.read_h5ad("result_files/canogamez_preprocessing.h5ad") # change to your data path
# path for storing the annotated UNS data
results_file = '/canogamez_UNS.h5ad' # change to your data path
sc.tl.pca(canogamez, svd_solver = 'arpack')
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:02)
Separate the UNS cells from the stimulated cells
# select all UNS cells
resting = canogamez.obs['cytokine.condition'] == 'UNS'
# create AnnData consisting only of UNS cells
canogamez_uns = canogamez[resting,:]
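Before subsetting, it can help to check how many cells the boolean mask selects. The sketch below shows the pattern on a small, made-up `obs` table; the condition labels other than `UNS` are placeholders and are not taken from the data set. Note that slicing an AnnData object with a mask returns a view, and appending `.copy()` to the slice yields an independent object.

```python
import pandas as pd

# Hypothetical stand-in for canogamez.obs; condition labels other than
# 'UNS' are placeholders.
obs = pd.DataFrame(
    {"cytokine.condition": ["UNS", "cond_A", "UNS", "cond_B", "UNS"]},
    index=[f"cell_{i}" for i in range(5)],
)

# Boolean mask, built the same way as in the notebook
resting = obs["cytokine.condition"] == "UNS"

# Number of cells that the mask selects
n_selected = int(resting.sum())
print(f"{n_selected} of {len(obs)} cells are UNS")

# Row subset of the toy table; .copy() makes it independent of obs
obs_uns = obs[resting].copy()
```

The same mask can be used to index the AnnData object, as done in the cell above.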
Plot PCA
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize = (20,4), gridspec_kw = {'wspace':1})
ax1_dict = sc.pl.pca(canogamez_uns, color = 'cell.type', ax = ax1, show = False, annotate_var_explained = True)
ax2_dict = sc.pl.pca(canogamez_uns, color = 'cytokine.condition', ax = ax2,
show = False,annotate_var_explained = True)
ax3_dict = sc.pl.pca(canogamez_uns, color = 'donor.id', ax = ax3, show = False, annotate_var_explained = True)
Trying to set attribute `.uns` of view, copying.
sc.pp.neighbors(canogamez_uns, n_neighbors = 10, n_pcs = 40)
sc.tl.umap(canogamez_uns)
computing neighbors
using 'X_pca' with n_pcs = 40
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:01)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:10)
sc.pl.umap(canogamez_uns, color = ['cytokine.condition', 'cell.type'])
The Louvain algorithm was chosen for clustering as it was used by Cano-Gamez et al.
! Caution: Rerunning the code can change the cluster composition due to the randomness of the algorithm !
sc.tl.louvain(canogamez_uns, key_added = "louvain_1.0", random_state = 1)
running Louvain clustering
using the "louvain" package of Traag (2017)
finished: found 8 clusters and added
'louvain_1.0', the cluster labels (adata.obs, categorical) (0:00:00)
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'cell.type'])
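Regarding the caution about rerunning the clustering: one way to check whether two runs produced the same partition is a contingency table of the two label columns. The sketch below uses made-up labels for six cells; in the notebook, the two series could be two `.obs` columns from Louvain runs stored under different `key_added` names (those names would be your own choice, none are defined here).

```python
import pandas as pd

# Hypothetical cluster labels from two runs on the same six cells.
# Run 2 finds the same groups but numbers them differently.
run_1 = pd.Series(["0", "0", "1", "1", "2", "2"], name="run_1")
run_2 = pd.Series(["2", "2", "0", "0", "1", "1"], name="run_2")

# Contingency table: rows are run-1 clusters, columns are run-2 clusters
overlap = pd.crosstab(run_1, run_2)
print(overlap)

# If every run-1 cluster falls into exactly one run-2 cluster and vice
# versa, the two partitions are identical up to relabeling.
nonzero = overlap > 0
same_partition = bool(
    (nonzero.sum(axis=1) == 1).all() and (nonzero.sum(axis=0) == 1).all()
)
print("same partition:", same_partition)
```

If the partitions differ, the off-diagonal entries of the table show which clusters were split or merged, which indicates where the manual annotation needs to be revisited.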
def count_pie(anndata, clustering, category):
    """Generate a data frame with counts for a specific category within
    the clusters and plot the values as pie charts."""
    # generate data frame with information for cluster
    clusters_df = anndata.obs[str(clustering)].to_frame()
    clusters_df[str(category)] = anndata.obs[str(category)]
    # generate empty dataframe for counted values
    number_clusters = len(np.unique(anndata.obs[str(clustering)]))
    row_names = list(np.unique(anndata.obs[str(clustering)]))
    row_names_long = ['cluster ' + name for name in row_names]
    col_names = list(anndata.obs[str(category)].cat.categories)
    df_cell_count = pd.DataFrame(0, columns=col_names,
                                 index=row_names_long)
    # fill dataframe with counts of the given category
    # (assumes cluster labels '0', '1', ... as produced by sc.tl.louvain)
    for i in range(0, number_clusters):
        cluster = clusters_df[str(clustering)] == str(i)
        cells_cluster = clusters_df[cluster]
        count_cells = cells_cluster.value_counts()
        for ic, vc in count_cells.items():
            df_cell_count.at['cluster ' + ic[0], ic[1]] = vc
    # plot as pie charts, with the clusters in natural sort order
    from natsort import natsorted
    df_cell_count_T = df_cell_count.T
    df_cell_count_T = df_cell_count_T.reindex(
        natsorted(df_cell_count_T.columns), axis=1)
    amount_plots = len(df_cell_count)
    amount_cols = 4
    amount_rows = int(np.ceil(amount_plots / amount_cols))
    # squeeze=False keeps axes two-dimensional even for a single row
    fig, axes = plt.subplots(nrows=amount_rows, ncols=amount_cols,
                             figsize=(15, 15), squeeze=False)
    fig.tight_layout()
    for index, column in enumerate(df_cell_count_T):
        current_ax = axes[index // amount_cols, index % amount_cols]
        current_ax.set_title('{}'.format(column))
        current_data = df_cell_count_T[column]
        current_labels = list(current_data.index)
        current_data = list(current_data)
        current_ax.pie(current_data, labels=current_labels,
                       autopct='%1.1f%%', startangle=90)
        current_ax.axis('equal')
    return df_cell_count, plt.show()
count_pie(canogamez_uns, 'louvain_1.0', 'cell.type')
(           Memory  Naive
 cluster 0     201   1171
 cluster 1    1243     66
 cluster 2     174    697
 cluster 3     638     47
 cluster 4     578     46
 cluster 5     122    117
 cluster 6      98     10
 cluster 7      56      5,
 None)
Most clusters can be clearly assigned to one particular cell type. Only cluster 5 seems to consist of a mixture of memory and naive cells.
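The observation about cluster 5 can also be expressed numerically. The sketch below copies the counts from the `count_pie` output and computes, per cluster, the share of the dominant cell type. The 0.7 cutoff for calling a cluster "mixed" is an arbitrary, illustrative choice and not part of the original analysis.

```python
import pandas as pd

# Counts copied from the count_pie output
counts = pd.DataFrame(
    {
        "Memory": [201, 1243, 174, 638, 578, 122, 98, 56],
        "Naive": [1171, 66, 697, 47, 46, 117, 10, 5],
    },
    index=[f"cluster {i}" for i in range(8)],
)

# Fraction of each cell type per cluster (each row sums to 1)
fractions = counts.div(counts.sum(axis=1), axis=0)

# Purity: share of the dominant cell type in each cluster
purity = fractions.max(axis=1)

# Illustrative cutoff (assumption): clusters below it count as mixed
mixed = list(purity[purity < 0.7].index)
print(fractions.round(2))
print("mixed clusters:", mixed)
```

With these counts, only cluster 5 falls below the cutoff, which matches the impression from the pie charts.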
sc.tl.rank_genes_groups(canogamez_uns, groupby = 'louvain_1.0', method = 'wilcoxon', use_raw=True)
ranking genes
finished: added to `.uns['rank_genes_groups']`
'names', sorted np.recarray to be indexed by group ids
'scores', sorted np.recarray to be indexed by group ids
'logfoldchanges', sorted np.recarray to be indexed by group ids
'pvals', sorted np.recarray to be indexed by group ids
'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:04)
sc.tl.dendrogram(canogamez_uns, groupby = 'louvain_1.0')
using 'X_pca' with n_pcs = 50
Storing dendrogram info using `.uns['dendrogram_louvain_1.0']`
sc.pl.rank_genes_groups_matrixplot(canogamez_uns, n_genes = 3,cmap = 'bwr',
standard_scale = "var", values_to_plot = 'scores')
The top-ranked genes are mostly ribosomal protein genes and give no indication of the cell types.
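Since ribosomal protein genes dominate the ranking, one option is to filter them out by name before inspecting the top genes. The following is a minimal sketch on a made-up gene list; in the notebook, the list could be taken from the `names` field of `.uns['rank_genes_groups']`. The prefix rule is a naming convention for human gene symbols, not an exhaustive definition.

```python
# Hypothetical top-ranked genes for one cluster (not taken from the results)
top_genes = ["RPS12", "RPL32", "CCR7", "RPLP0", "MRPL16", "GZMA"]

# Human cytosolic ribosomal protein genes start with RPS or RPL
# (RPLP0 is covered by the RPL prefix); mitochondrial ones use MRPS/MRPL
ribo_prefixes = ("RPS", "RPL", "MRPS", "MRPL")

# Keep only genes that do not match any ribosomal prefix
informative = [g for g in top_genes if not g.startswith(ribo_prefixes)]
print(informative)
```

Filtering only changes which genes are displayed; the ranking itself stays as computed by `sc.tl.rank_genes_groups`.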
Annotate naive cells
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'cell.type'], legend_loc = 'on data')
Cluster 7 is labeled mostly as Memory in cell.type but shows similar expression as clusters 0 and 2 -> assign to Tn
The marker genes from literature are based on the following website: https://www.biocompare.com/Editorial-Articles/569888-A-Guide-to-T-Cell-Markers/
| T cell type / differentiation state | Marker genes mentioned in Cano-Gamez et al. |
|---|---|
| central memory T cells (Tcm) | PASK |
| effector memory T cells (Tem) | IL7R, KLRB1, TNFSF13B |
| terminally differentiated effector cells (TEMRA) | CCL4, GZMA |
| natural T regulatory cells (nTreg) | FOXP3, CTLA4 |

| T cell type / differentiation state | Marker genes from literature |
|---|---|
| T naive (Tn) | CCR7 |
| central memory T cells (Tcm) | FAS, IL2RB, PRDM1 |
| effector memory T cells (Tem) | CXCR3, ITGAL, CCR5, TBX21 |
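The two tables can be collected in a single dictionary so that the marker lists are defined once and reused for all plots. The sketch below also checks which markers are present in the data, because plotting a gene that is absent raises an error. The `available` set is a made-up stand-in; in the notebook it could be built from the `var_names` of the AnnData object (or of its `.raw` attribute, if that is what the plots use).

```python
# Marker genes copied from the two tables above
markers = {
    "Tn": ["CCR7"],
    "Tcm": ["PASK", "FAS", "IL2RB", "PRDM1"],
    "Tem": ["IL7R", "KLRB1", "TNFSF13B", "CXCR3", "ITGAL", "CCR5", "TBX21"],
    "TEMRA": ["CCL4", "GZMA"],
    "nTreg": ["FOXP3", "CTLA4"],
}

# Hypothetical set of gene names present in the data (assumption)
available = {"CCR7", "PASK", "FAS", "IL7R", "KLRB1", "CCL4", "GZMA", "FOXP3"}

# Split each marker list into genes that can be plotted and genes that are missing
present = {ct: [g for g in genes if g in available] for ct, genes in markers.items()}
missing = {ct: [g for g in genes if g not in available] for ct, genes in markers.items()}

for cell_type in markers:
    print(cell_type, "present:", present[cell_type], "missing:", missing[cell_type])
```

The lists in `present` could then be passed to the `color` argument of `sc.pl.umap` together with `'louvain_1.0'`.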
TEMRA
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'GZMA', 'CCL4'])
-> markers clearly expressed in cluster 5
nTreg
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'FOXP3', 'CTLA4'])
-> marker genes expressed in cluster 6
Tn
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'CCR7'])
-> marker gene mostly expressed in clusters 0, 2 and 7
Tcm
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'FAS', 'IL2RB', 'PRDM1', 'PASK'])
-> Tcm marker expression present in clusters 1 and 3
Tem (markers from literature)
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'CXCR3', 'ITGAL', 'CCR5','TBX21'])
Tem (markers from Cano-Gamez et al.)
sc.pl.umap(canogamez_uns, color = ['louvain_1.0', 'IL7R', 'KLRB1', 'TNFSF13B'])
-> Tem cells can be assigned to cluster 4
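The assignments above are read off the UMAP plots by eye. As a complement, the mean expression of each marker per cluster can be tabulated. The sketch below uses made-up expression values for six cells purely to show the pattern; the numbers are not taken from the data set and are not evidence for the assignments.

```python
import pandas as pd

# Hypothetical expression values for six cells (assumption, illustration only)
toy = pd.DataFrame(
    {
        "louvain_1.0": ["4", "4", "5", "5", "6", "6"],
        "KLRB1": [2.0, 1.5, 0.2, 0.0, 0.1, 0.0],
        "GZMA": [0.3, 0.1, 2.5, 3.0, 0.0, 0.2],
        "FOXP3": [0.0, 0.1, 0.0, 0.2, 1.8, 2.2],
    }
)

# Mean expression of each marker per cluster
mean_expr = toy.groupby("louvain_1.0").mean()

# Cluster with the highest mean expression for each marker
top_cluster = mean_expr.idxmax()
print(mean_expr)
print(top_cluster)
```

On real data, the expression columns would have to be extracted from the AnnData object first; how to do that depends on whether the raw or the scaled values are of interest.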
Create annotation
# adjust to individually identified clusters
cluster_annotation = {
'0': 'TN',
'1': 'TCM',
'2': 'TN',
'3': 'TCM',
'4': 'TEM',
'5': 'TEMRA',
'6': 'nTreg',
'7': 'TN'
}
Add annotation to data
canogamez_uns.obs['cell type'] = canogamez_uns.obs['louvain_1.0'].map(cluster_annotation).astype('category')
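Because the cluster numbering can change when the clustering is rerun, it is worth checking that every cluster label has an entry in the dictionary; labels without an entry are silently mapped to NaN. A minimal sketch of such a check, using made-up labels in which `'8'` stands for a cluster that is not covered by the dictionary:

```python
import pandas as pd

# Mapping copied from the annotation dictionary above
cluster_annotation = {
    "0": "TN", "1": "TCM", "2": "TN", "3": "TCM",
    "4": "TEM", "5": "TEMRA", "6": "nTreg", "7": "TN",
}

# Hypothetical cluster labels; '8' has no entry in the dictionary
labels = pd.Series(["0", "5", "7", "8", "3"])

annotated = labels.map(cluster_annotation)

# Labels without an entry in the dictionary are mapped to NaN
unmapped = sorted(labels[annotated.isna()].unique())
print("unmapped clusters:", unmapped)
```

In the notebook, the same check could be applied to `canogamez_uns.obs['louvain_1.0']` before the annotated column is written.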
Plotting
sc.pl.umap(canogamez_uns, color = 'cell type', legend_loc = 'on data', title = 'Annotated UNS cells',
frameon = False, legend_fontsize = 10)
The resulting annotation is similar to the one by Cano-Gamez et al., see: https://cytokines.cellgeni.sanger.ac.uk/resting
canogamez_uns.write(results_file)